Efficient Exploration through Bayesian Deep Q-Networks

نویسندگان

Kamyar Azizzadenesheli

Emma Brunskill

Anima Anandkumar

چکیده

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) model. This layer can be trained with fast closed-form updates and its samples can be drawn efficiently through the Gaussian distribution. We apply our method to a wide range of Atari games in Arcade Learning Environments. Since BDQN carries out more efficient exploration, it is able to reach higher rewards substantially faster than a key baseline, the double deep Q network (DDQN).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bayesian Deep Q-Learning via Continuous-Time Flows

Efficient exploration in reinforcement learning (RL) can be achieved by incorporating uncertainty into model predictions. Bayesian deep Q-learning provides a principle way for this by modeling Q-values as probability distributions. We propose an efficient algorithm for Bayesian deep Q-learning by posterior sampling actions in the Q-function via continuous-time flows (CTFs), achieving efficient ...

متن کامل

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ε-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such...

متن کامل

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as -greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additiona...

متن کامل

Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

Exploration in an unknown environment is the core functionality for mobile robots. Learning-based exploration methods, including convolutional neural networks, provide excellent strategies without human-designed logic for the feature extraction [1]. But the conventional supervised learning algorithms cost lots of efforts on the labeling work of datasets inevitably. Scenes not included in the tr...

متن کامل

Efficient exploration with Double Uncertain Value Networks

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We identify methods to learn th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1802.04412 شماره

صفحات -

تاریخ انتشار 2018

Efficient Exploration through Bayesian Deep Q-Networks

نویسندگان

چکیده

منابع مشابه

Bayesian Deep Q-Learning via Continuous-Time Flows

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

Efficient exploration with Double Uncertain Value Networks

عنوان ژورنال:

اشتراک گذاری